    Unsupervised Integration of Multiple Protein Disorder Predictors: The Method and Evaluation on CASP7, CASP8 and CASP9 Data

    Abstract. Background: Studies of intrinsically disordered proteins, which lack a stable tertiary structure but still have important biological functions, rely critically on computational methods that predict this property from sequence information. Although a number of fairly successful models for predicting protein disorder have been developed over the last decade, the quality of their predictions is limited by the number of available cases of confirmed disorder. Results: To estimate protein disorder from protein sequences more reliably, an iterative algorithm is proposed that integrates the predictions of multiple disorder models without relying on any protein sequences with confirmed disorder annotation. The method alternates between a maximum a posteriori (MAP) estimate of the disorder prediction and a maximum-likelihood (ML) estimate of the quality of each disorder predictor. Experiments on data used at CASP7, CASP8, and CASP9 have shown the effectiveness of the proposed algorithm. Conclusions: The proposed algorithm can potentially be used to predict protein disorder and to provide helpful suggestions on choosing suitable disorder predictors for unknown protein sequences.
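The alternation between a MAP step and an ML step can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified hard-assignment variant that assumes binary per-residue calls, conditionally independent predictors, a majority-vote initialisation, and add-one smoothing. The function name `integrate_predictors` and all parameters are hypothetical.

```python
import math

def integrate_predictors(preds, max_iter=50, smooth=1.0):
    """Combine binary disorder calls without labelled data (illustrative sketch).

    preds[k][i] is the 0/1 call of predictor k for residue i.
    Returns (consensus labels, [(sensitivity, specificity) per predictor]).
    """
    n_pred, n_res = len(preds), len(preds[0])
    # Assumption: start from a majority vote as the initial consensus.
    labels = [1 if 2 * sum(p[i] for p in preds) > n_pred else 0
              for i in range(n_res)]
    quality = []
    for _ in range(max_iter):
        # ML step: estimate each predictor's quality against the consensus.
        n_pos = sum(labels)
        n_neg = n_res - n_pos
        prior = (n_pos + smooth) / (n_res + 2 * smooth)
        quality = []
        for p in preds:
            tp = sum(1 for i in range(n_res) if labels[i] == 1 and p[i] == 1)
            tn = sum(1 for i in range(n_res) if labels[i] == 0 and p[i] == 0)
            sens = (tp + smooth) / (n_pos + 2 * smooth)
            spec = (tn + smooth) / (n_neg + 2 * smooth)
            quality.append((sens, spec))
        # MAP step: relabel each residue with its most probable state.
        new_labels = []
        for i in range(n_res):
            log_pos = math.log(prior)
            log_neg = math.log(1.0 - prior)
            for p, (sens, spec) in zip(preds, quality):
                if p[i] == 1:
                    log_pos += math.log(sens)
                    log_neg += math.log(1.0 - spec)
                else:
                    log_pos += math.log(1.0 - sens)
                    log_neg += math.log(spec)
            new_labels.append(1 if log_pos > log_neg else 0)
        if new_labels == labels:
            break  # consensus stopped changing
        labels = new_labels
    return labels, quality
```

Calling `integrate_predictors` on a list of per-predictor call vectors returns both the consensus labelling and an estimated sensitivity and specificity for each predictor. The second output is the kind of quality estimate that could inform a choice among predictors.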

    Intrinsic disorder in putative protein sequences

    Abstract: Intrinsically disordered proteins perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder achieve per-residue accuracies of over 80%. In a genome-wide study we observed a large difference in predicted disorder content between confirmed and putative human proteins, and suspected that it is due to large errors introduced by the gene-finding algorithms used for putative sequence annotation. To test this hypothesis, we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic the errors of gene-finding algorithms. Applied to putative human protein sequences, it shows that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher disorder content than correctly assigned regions. Our finding provides the first evidence that the current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates are biased.

    Systematic Framework for Integration of Weather Data into Prediction Models for the Electric Grid Outage and Asset Management Applications

    This paper describes a Weather Impact Model (WIM) that can serve a variety of predictive applications, ranging from real-time operation and day-ahead operation planning to asset and outage management. The model combines various weather parameters into weather impact features of interest to a specific application. The work focuses on developing a universal weather impact model based on logistic regression embedded in a Geographic Information System (GIS). It can merge massive data sets, from historical outage and weather data to real-time weather forecasts and network monitoring measurements, into a feature known as weather hazard probability. Examples from outage and asset management applications illustrate the model's capabilities.
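As a rough illustration of how logistic regression can turn weather parameters into a hazard probability, the sketch below fits a model by plain gradient descent. It is not the WIM from the paper: the feature layout (for example normalised wind speed and precipitation), the function names `hazard_probability` and `fit_logistic`, and the learning-rate settings are all assumptions, and the GIS integration is omitted.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hazard_probability(weights, bias, features):
    """Probability of an outage given one row of weather features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def fit_logistic(rows, outcomes, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss.

    rows: list of feature lists (hypothetical, already normalised to [0, 1]).
    outcomes: list of 0/1 flags, 1 meaning an outage was recorded.
    """
    n_feat = len(rows[0])
    weights, bias = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(rows, outcomes):
            err = hazard_probability(weights, bias, x) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias
```

After fitting on historical rows, `hazard_probability(weights, bias, forecast_row)` can be evaluated on a forecast row. This mirrors the historical-data-for-training, forecast-for-prediction split described in the abstract.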

    Prediction of Solar Radiation Based on Spatial and Temporal Embeddings for Solar Generation Forecast

    A novel method is proposed for real-time solar generation forecasting using weather data, exploiting both spatial and temporal structural dependencies. The network observed over time is projected to a lower-dimensional representation, where a variety of weather measurements are used to train a structured regression model, while the weather forecast is used at the inference stage. Experiments were conducted at 288 locations in the San Antonio, TX area on data obtained from the National Solar Radiation Database. The model predicts solar irradiance with good accuracy (R² of 0.91 for the summer, 0.85 for the winter, and 0.89 for the global model). The best accuracy was obtained by the Random Forest Regressor. Additional experiments characterized the influence of missing data and of different time horizons, providing evidence that the new algorithm is robust not only when data are missing completely at random but also when the missingness mechanism is spatial and temporal.
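The accuracy figures above are reported as R² values. Assuming these are the usual coefficient of determination (the abstract does not define the metric), it can be computed as in the sketch below. The function name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a value of 1 means the predictions match the observations exactly, and 0 means the model does no better than predicting the observed mean everywhere.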